Named Entity Recognition in Crime News Documents Using Classifiers Combination

Authors

  • Hafedh Ali Shabat
  • Nazlia Omar
Abstract

The increasing volume of crime information readily available on the web makes it very difficult to retrieve, analyze, and use the valuable information in such texts manually. This work focuses on designing models for extracting crime-specific information from the web. To that end, this paper proposes an ensemble framework for the crime named entity recognition task. The main aim is to efficiently integrate feature sets and classification algorithms to synthesize a more accurate classification procedure. First, three well-known text classification algorithms, namely Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor, are employed as base classifiers for each of the feature sets. Second, a weighted voting ensemble method is used to combine these three classifiers. To evaluate these models, a manually annotated data set obtained from BERNAMA is used. Experimental results demonstrate that an ensemble model is an effective way to combine different feature sets and classification algorithms for better classification performance. The ensemble model achieves an overall F-measure of 89.48% for identifying crime type and 93.36% for extracting crime-related entities. The ensemble model trained with suitable features outperforms the baseline models.
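The weighted voting step described in the abstract can be illustrated with a minimal sketch. It assumes each base classifier emits one label per token and that each classifier carries a single non-negative weight; the abstract does not say how the weights are set, so the values below (and the labels `LOCATION`, `WEAPON`, `O`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest summed classifier weight.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> non-negative weight
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    # Iterating over sorted labels makes tie-breaking deterministic:
    # on a tie, the alphabetically first label wins.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical weights, e.g. each base classifier's validation F-measure.
weights = {"naive_bayes": 0.80, "svm": 0.90, "knn": 0.70}

# Hypothetical per-token predictions from the three base classifiers.
token_predictions = [
    {"naive_bayes": "LOCATION", "svm": "LOCATION", "knn": "O"},
    {"naive_bayes": "WEAPON", "svm": "O", "knn": "WEAPON"},
]

labels = [weighted_vote(p, weights) for p in token_predictions]
```

In the second token the highest-weighted classifier is outvoted, because the combined weight of the two agreeing classifiers is larger. That is the behaviour that distinguishes weighted voting from simply trusting the best single model.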


Similar resources

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...


PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...


Cross domain Chinese speech understanding and answering based on named-entity extraction

The Chinese language is not alphabetic, with a flexible wording structure and a large number of domain-specific terms generated every day for each domain. In this paper, a new approach for cross-domain Chinese speech understanding and answering is proposed based on named-entity extraction. This approach includes two parts: a speech query recognition (SQR) part and a speech understanding and answering (...


Named Entity Recognition through Redundancy Driven Classifiers

We present Typhoon, a classifier combination system for Named Entity Recognition (NER), in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. Data Redundancy is attained when the same entity occurs in different places in documents, whereas Patterns are 2-grams, 3-grams, 4-grams and 5-grams preceding, and following entities in...


Voted NER System using Appropriate Unlabeled Data

This paper reports a voted Named Entity Recognition (NER) system with the use of appropriate unlabeled data. The proposed method is based on the classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) and has been tested for Bengali. The system makes use of the language independent features in the form of different contextual and orthographic wo...




Publication date: 2015